Generate HTML output and send it to Slack, make output files downloadable in the web UI #3

nkaretnikov · 2023-11-27T02:18:06Z

nkaretnikov · 2023-11-27T02:31:37Z

Summary:

adds a send_to_slack step to scheduled and one-time workflows
it uses the Slack API to send HTML output to a specified Slack channel
added a call to jupyter nbconvert to generate HTML
configured via "Parameters" SLACK_TOKEN and SLACK_CHANNEL in "Notebook Jobs" in the web UI, which are accessible via envs in the code
see the Slack API docs on how to configure a bot to send a file to a channel -- this needs to be done first for the bot/sending functionality to work
this new step is integrated with update_job_status_failure, so it will be visible in the UI if it fails
the Slack script also has some printing and additional validation, so an exception will be raised on failure, which will cause the job to fail
cmd_args generation is changed because (1) two commands are now called there and (2) it's passed to /bin/sh as a string anyway, so no point in keeping that in a list
changed *path functions to return Path objects since that's more flexible, in case callers want to modify these paths.
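
As a rough illustration of the cmd_args change described above, the sketch below builds one shell string that runs papermill and then jupyter nbconvert. The function name `build_cmd`, the use of `&&` to chain the commands, and the exact flags are assumptions for illustration; the code in this PR may differ.

```python
import shlex
from pathlib import Path


def build_cmd(input_path: Path, output_path: Path) -> str:
    """Return a single command string suitable for `/bin/sh -c`.

    Hypothetical helper: the real cmd_args generation may differ.
    """
    # Quote every path so spaces or shell metacharacters survive /bin/sh.
    src = shlex.quote(str(input_path))
    dst = shlex.quote(str(output_path))

    # First execute the notebook, then convert the executed copy to HTML.
    papermill = f"papermill {src} {dst}"
    nbconvert = f"jupyter nbconvert --to html {dst}"

    # Assumption: the HTML is generated only if papermill succeeds.
    return f"{papermill} && {nbconvert}"


cmd = build_cmd(Path("in.ipynb"), Path("out dir/out.ipynb"))
```

Because the paths are Path objects, a caller can adjust them (for example with `with_suffix`) before they are turned into strings.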

Notes:

Since there are multiple steps, it takes time to spawn and execute them. This means that configuring "Run on a schedule" with a "Minute" interval is only enough to start the main job and send to Slack, but not to update the status. After that, the whole workflow is restarted. Use a longer interval, for example, */5 * * * * (every 5 mins).
This requires papermill to be part of the environment used when scheduling a job.
The jupyter command is available globally, so it doesn't need to be in the environment.
"Output formats" checkboxes in the UI (Notebook, HTML) do nothing. Both of these formats are always generated.

nkaretnikov · 2023-11-27T02:44:48Z

Testing checklist:

raising an exception in send_to_slack causes the workflow to fail (visible in the UI)
non-zero exit code in command in send_to_slack causes the workflow to fail (visible in the UI)
"Run now" sends to Slack and updates the status in the UI
"Run on a schedule" (*/5 * * * *) sends to Slack and updates the status in the UI (all steps are executed)
When SLACK_TOKEN and SLACK_CHANNEL are not specified, "Run now" works and updates the status in the UI, nothing is sent to Slack
When SLACK_TOKEN and SLACK_CHANNEL are not specified, "Run on a schedule" (*/5 * * * *) works and updates the status in the UI, nothing is sent to Slack
A message is sent to Slack with a valid HTML file, which is generated when the job is run. Easy to validate via:

import datetime
datetime.datetime.now()

argo_jupyter_scheduler/utils.py

dharhas · 2023-11-28T17:05:26Z

So how will this be configured? I.e., "send to Slack" is not a feature that all Nebari / jupyter-scheduler users will need. Also, someone else might want to send it to Mattermost or another REST API. Is there a way to make this a bit more generic?

nkaretnikov · 2023-11-28T18:05:01Z

@dharhas

So how will this be configured? i.e. "send to slack" is not a feature that all nebari / jupyter-scheduler users will need.

Currently, it'll only execute this task if you provide SLACK_TOKEN and SLACK_CHANNEL as Parameters when scheduling the notebook. If you don't provide this, nothing will be sent.
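
A minimal sketch of that gating logic, assuming the job parameters arrive as a dict (or None when no parameters were given). The helper name `slack_settings` is hypothetical and not taken from this PR.

```python
from typing import Optional, Tuple


def slack_settings(parameters: Optional[dict]) -> Optional[Tuple[str, str]]:
    """Return (token, channel) when both are provided, otherwise None.

    Hypothetical helper: a None result means the Slack step is skipped.
    """
    # Parameters can be None when the user did not set any.
    if not parameters:
        return None

    token = parameters.get("SLACK_TOKEN")
    channel = parameters.get("SLACK_CHANNEL")

    # Both values are required; a partial configuration sends nothing.
    if token and channel:
        return token, channel
    return None
```

With this shape, the workflow would add the Slack step only when `slack_settings(...)` returns a pair.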

Is there a way to make this a bit more generic?

Technically, we can turn this into "specify an arbitrary shell command and I'll execute it", but I don't think it's a good design:

Users might run into issues with string escaping.
It would prevent us from doing API-specific checking of whether the request was successful or not.

I'd suggest we add support for additional APIs separately, on a case by case basis.
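
To make the "API-specific checking" point concrete, here is a small sketch. It relies on the fact that Slack Web API responses are JSON objects with an `ok` boolean and, on failure, an `error` string. The function name `check_slack_response` is an assumption and not necessarily what the PR uses.

```python
def check_slack_response(payload: dict) -> None:
    """Raise if a decoded Slack Web API response reports a failure.

    Hypothetical helper: raising here is what would make the step,
    and therefore the job, fail visibly.
    """
    if not payload.get("ok"):
        # Slack puts a short machine-readable reason in "error".
        reason = payload.get("error", "unknown")
        raise RuntimeError(f"Slack API error: {reason}")


# A successful response passes silently.
check_slack_response({"ok": True})
```

A generic "run any shell command" hook could only look at the exit code, so a response such as `{"ok": false, "error": "channel_not_found"}` would go unnoticed.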

argo_jupyter_scheduler/executor.py

argo_jupyter_scheduler/utils.py

dharhas · 2023-12-05T13:37:35Z

@nkaretnikov let's add docs. Also, I think we need to make sure runs are timestamped.

Are they saved to disk as well as sent to Slack? "Send to Slack" needs to be optional.

nkaretnikov · 2023-12-05T13:55:47Z

Is there an example (a screenshot maybe) of slack output in a channel or something?

Slack previews HTML as source code here. I think they don't render it by default for security reasons. I've looked and I'm not sure there's a way to render it. Once you download it, it's valid HTML.

- These files correspond to "Output formats" on the "Create Job" page and have timestamps matching `job.create_time`
- jupyter-scheduler looks for these when you download files via "Output files" on the "Notebook Jobs" page; the download fails otherwise
- See `create_output_filename` and `get_staging_paths` in jupyter-scheduler.
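
A sketch of how such a timestamped name could be derived. The timestamp layout is inferred from the value `1969-12-31-06-00-00-PM` quoted later in this review; the helper name `output_filename` and the assumption that `create_time` is in milliseconds since the epoch are illustrative, so check `create_output_filename` in jupyter-scheduler for the real logic.

```python
import os
from datetime import datetime, timedelta, timezone


def output_filename(input_name, create_time_ms, output_format, tz=None):
    """Build '<stem>-<timestamp>.<format>' from a job's creation time.

    Hypothetical helper. With tz=None the local time zone is used,
    which is why the same job can show different dates on different hosts.
    """
    stem = os.path.splitext(input_name)[0]
    created = datetime.fromtimestamp(create_time_ms / 1000, tz=tz)
    stamp = created.strftime("%Y-%m-%d-%I-%M-%S-%p")
    return f"{stem}-{stamp}.{output_format}"


# Epoch 0 rendered in a UTC-6 zone.
central = timezone(timedelta(hours=-6))
name = output_filename("report.ipynb", 0, "html", tz=central)
```

Since the name depends on the time zone, the file lookup has to use the same `create_time` and zone that were used when the file was written.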

argo_jupyter_scheduler/executor.py

argo_jupyter_scheduler/scheduler.py

argo_jupyter_scheduler/utils.py

README.md

The actual value can also be 1969-12-31-06-00-00-PM, so remove the exact date to avoid confusion.

nkaretnikov · 2023-12-09T15:05:00Z

Manually tested everything as of this commit (3a42dd0):

went through this checklist again and re-tested everything: Generate HTML output and send it to Slack, make output files downloadable in the web UI #3 (comment)
checked that files are downloadable
checked that no extra files are deleted due to changes in the deletion logic
workflows are executed even without jupyterlab running
when an exception is raised in a notebook, you can still download it and the workflow is set to failed, the exception is printed in the notebook

nkaretnikov · 2023-12-09T18:13:42Z

@aktech I've tested and reviewed this. PTAL

aktech

Hey @nkaretnikov

Thanks for taking another pass at this. I am having a hard time understanding the flow and the reasoning for various things here, even after reading some of your notes, maybe because they're scattered here and there and probably because of my lack of understanding of Argo Workflows. I see you have added usage documentation. Can you please add a short comment explaining step by step what's happening here architecturally? For example:

We schedule a job from papermill with parameters x, y
Then we create a job for running the given notebook via Argo workflows...
Then we rename files because of reason m, n, o, etc... and that's the most apt way because of reason j, k, l
....

We can then put this doc in the code itself as well, since not everyone contributing would be very familiar with Argo workflows.

aktech · 2023-12-11T10:01:38Z

argo_jupyter_scheduler/executor.py

+
+    try:
+        # Sets up logging
+        logger = setup_logger("rename_files")


This shadows name logger from outer scope.

Nope, that's on purpose. I've just re-tested to be sure, too. The global logger name is not accessible here. Things used in these scripts need to be local to them because they'll be running as separate pods. That's why they have these local imports. And you also cannot pass arbitrary Python objects as parameters - only very basic serializable things, like strings and dicts.
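
To illustrate the constraint, here is a sketch of a script-style function whose imports are local and whose arguments are plain strings. All names (`to_script_args`, `rename_files_script`) are hypothetical; this is not the code from the PR, only the general pattern it describes.

```python
import json
from pathlib import Path


def to_script_args(log_path: Path, staging_paths: dict) -> dict:
    """Flatten rich objects into strings before handing them to a step."""
    return {
        "log_path": str(log_path),
        # A dict of Path objects becomes a JSON string of plain strings.
        "staging_paths": json.dumps({k: str(v) for k, v in staging_paths.items()}),
    }


def rename_files_script(log_path: str, staging_paths: str) -> list:
    """Body that could run in a separate pod: nothing from module scope is used."""
    # Local imports: the enclosing module's globals are assumed unavailable here.
    import json
    import logging
    import os

    logger = logging.getLogger("rename_files")  # local logger, not the global one
    paths = json.loads(staging_paths)
    logger.info("staging paths: %s", paths)
    return sorted(os.path.basename(p) for p in paths.values())


args = to_script_args(Path("/tmp/job.log"), {"html": Path("/tmp/out/report.html")})
names = rename_files_script(**args)
```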

aktech · 2023-12-11T10:12:04Z

argo_jupyter_scheduler/executor.py

+
+    except Exception as e:
+        msg = "Failed to rename files"
+        logger.info(msg)


Use logger.exception to log the full traceback.
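
For reference, a self-contained sketch of the difference: `logger.exception` logs at ERROR level and appends the active traceback, while `logger.info(msg)` records only the message. The logger name and the simulated error are made up for the demo.

```python
import io
import logging


def run_and_log(logger: logging.Logger) -> None:
    try:
        raise ValueError("boom")  # stand-in for a failing rename
    except Exception:
        # Must be called from an except block so the traceback is available.
        logger.exception("Failed to rename files")


# Capture the log output in memory instead of a file.
stream = io.StringIO()
demo_logger = logging.getLogger("rename_files_demo")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False
demo_logger.addHandler(logging.StreamHandler(stream))

run_and_log(demo_logger)
output = stream.getvalue()
```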

aktech · 2023-12-11T10:14:41Z

argo_jupyter_scheduler/executor.py

+    try:
+        # Sets up logging
+        logger = setup_logger("rename_files")
+        add_file_logger(logger, log_path)


Why are these two lines inside the try block?

If there is an exception here then you'd not have the logger variable in the except block anyway.
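
A small sketch of the problem, with a hypothetical `failing_setup` standing in for logger setup that raises. When the setup is inside the try block, the except block hits an unbound `logger` and the original error is masked.

```python
def failing_setup():
    """Stand-in for logger setup that fails, e.g. an unwritable log path."""
    raise OSError("cannot open log file")


def setup_inside_try():
    try:
        logger = failing_setup()
        logger.info("renaming")
    except Exception:
        # `logger` was never assigned, so this raises UnboundLocalError.
        logger.info("Failed to rename files")


def setup_outside_try():
    # Setup failures now propagate as the original OSError.
    logger = failing_setup()
    try:
        logger.info("renaming")
    except Exception:
        logger.exception("Failed to rename files")
        raise
```

Calling `setup_inside_try()` raises `UnboundLocalError`, while `setup_outside_try()` surfaces the real `OSError`.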

aktech · 2023-12-11T10:28:37Z

argo_jupyter_scheduler/executor.py

+                )
+
+                failure += " || {{steps.rename-files.status}} == Failed"
+                successful += " && {{steps.rename-files.status}} == Succeeded"
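
The pattern in this hunk, appending one status clause per step, can be sketched as follows. The helper `status_conditions` and the step name `main` are hypothetical; only `rename-files` and the `{{steps.<name>.status}}` syntax come from the diff above.

```python
def status_conditions(step_names):
    """Return (successful, failure) expressions over the given steps.

    Hypothetical helper: success requires every step to succeed,
    failure is triggered by any single failed step.
    """
    # Quadruple braces in an f-string produce the literal "{{" and "}}".
    failed = [f"{{{{steps.{name}.status}}}} == Failed" for name in step_names]
    succeeded = [f"{{{{steps.{name}.status}}}} == Succeeded" for name in step_names]
    return " && ".join(succeeded), " || ".join(failed)


successful, failure = status_conditions(["main", "rename-files"])
```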


The start_time value needs to be generated within that step. But we cannot pass the start_time value directly between these steps. Instead, we use the database.

What do you mean by generated? Isn't start_time the time the job starts, rather than something that's generated?

nkaretnikov · 2023-12-18T03:19:37Z

@aktech PTAL. Made the changes you suggested, added more info to the internals section of README. Tested to make sure it's working and the backtraces are logged to a file.

nkaretnikov · 2024-01-11T03:22:19Z

I went ahead and merged this since it'd be nice to have as part of the current Nebari release, see nebari-dev/nebari#2195 (comment).

Generate HTML output and send it to Slack

7419b2d

nkaretnikov marked this pull request as ready for review December 4, 2023 02:24

aktech requested changes Dec 5, 2023

nkaretnikov added 15 commits December 5, 2023 15:09

Remove redundant format string

5bbf836

Use parameters instead of iterating over envs

5692f51

Use requests in send_to_slack

a1c8c59

Use variables in the papermill command

7a4e650

Fix linting issues

243c7ad

Use os.path.basename since script args are serialized

d212f09

Account for parameters being None

eb57d45

Add docs on how to send to Slack

3ff0866

Log Slack script output to a file

27eadc9

Add missing import

d8a4624

Try passing logger to the Slack script

ebdff98

Try using a local logger since it cannot be an arg

be83ac4

Fix the import

b5ac5f9

Log staging_paths

060d837